ON THE INSTABILITY OF GRAMMATICALITY JUDGMENTS
AAAL, 1988 December 
David Birdsong 
University of Florida

The pros and cons of using linguistic intuitions as a data base for theory have been 
argued for decades. Early participants in this debate included such luminaries as Hill 
(1961), Chomsky (1964, 1965), Labov (1975), and Bolinger (1968). More recently, the 
pros have been presented admirably by Newmeyer (1983). The cons, on the other hand, are
most saliently illustrated by the fact that two years ago the editors of Linguistic Inquiry
declared a moratorium on papers on the so-called "contraction" debate, in part because
the principals couldn't agree on whether sentences like (1) on your handout (Who do you
wanna do it?) are grammatical [that is, whether want to can be contracted to wanna in
such contexts].

At the root of the debate is the seeming capriciousness of grammaticality 
judgments. Not only do speakers routinely not agree, it is frequently the case that 
individuals' judgments of sentence grammaticality can vary from one elicitation to the 
next. For example, in a recent experimental study, Nagata (1988) found that 
grammaticality judgments for isolated sentences tend to change when subjects are asked to 
repeat them or to embed them in realistic contexts. Nagata has also documented variable
effects of context on the rating of grammatical versus ungrammatical strings. Other
studies that attest to intra-subject inconsistency or instability include Carden (1970),
Snow & Meijer (1977), and Birdsong (1989) [see also Carroll, Bever & Pollack, 1981, for a
demonstration of instability on a different metalinguistic task].

In the face of such phenomena, many mainstream generative linguists have aligned 
themselves with psychologists and have concluded, to quote Lasnik (1981), who
paraphrases Peters and Ritchie (1973), that "grammaticality judgments are often
incorrectly considered as direct reflections of [linguistic] competence. ... responding to a
grammaticality query is an instance of [metalinguistic] performance" (p. 20; brackets
mine; italics Lasnik's). [This quote is given in (2) on the handout.]

In relegating grammaticality judgments to the realm of performance, theorists 
acknowledge how dirty the data are. What, if anything, can be done with these data is the
next question. If they are to be used, even peripherally, to inform theory, then it
behooves us to follow the urgings of Carroll, Bever & Pollack (1981), Levelt (1974), and
others and try to understand more about the psychology of this type of metalinguistic
performance. In other words, if grammaticality judgments are not simply mirrors of
linguistic knowledge, what are they?

In the present paper, I pursue a tentative and partial answer to this question. To 
that end, I will outline elements of a psychological model of speakers' performance on 
grammaticality judgment tasks. This model, which departs in significant ways from 
notions of metalinguistic performance outlined in Bialystok & Ryan (1985), Newmeyer
(1983), and Birdsong (1989), is derived principally from recent work in category theory 
by Barsalou (1987). A distinction will be made between categorical knowledge and certain 
ad hoc and contextually-variable concepts which people use to represent such presumed 
knowledge. Particular emphasis will be placed on accounting for the frequently-attested 
instability of grammaticality judgments. The arguments and evidence to be presented 
should be considered as initial gropings toward an eventual coherent picture of 
performance on grammaticality judgment tasks, and of the relationship of this 
performance to linguistic knowledge. 

I would like to start by introducing the main features of Barsalou's theory of 
categorization. Although I won't be able to get into the subtleties of the model, those of 
you who are familiar with George Lakoff's work will perceive areas of overlap with and
divergence from Barsalou's thinking. This is not the proper time to compare the two; 
instead I refer interested members of the audience to a collection of papers edited by 
Neisser (1987), in particular the chapters by Lakoff, Barsalou, and McCauley. (All these 
references are given in the bibliography on your handout.) As a further note of
clarification, I would like to point out that Barsalou and I diverge in our relative 
emphases, as well as in our articulation of certain theoretical details; I apologize in 
advance if time constraints prevent me from specifying all such divergences. Again, I
urge those of you who are interested to read the Barsalou, Lakoff, and McCauley papers to 
get a sense of these thinkers' contributions to the model I discuss today. 

First, a bit of terminology. Barsalou makes a critical distinction between category 
and concept. Categories are cognitive structures which may have either finite or infinite 
membership or exemplars. Some categories may be formal (e.g., ODD NUMBERS, SQUARES),
while others may be goal-derived (e.g., THINGS TO TAKE ON A CAMPING TRIP). Still 
others may not be established in memory, and indeed are rarely if ever thought about. 
Barsalou has shown empirically, for example, that people are able to create and 
manipulate such categories as WAYS TO ESCAPE BEING KILLED BY THE MAFIA and 
THINGS THAT COULD FALL ON YOUR HEAD. Categories typically display gradedness or 
prototype effects, such that some members of a category are perceived in performance
contexts as better exemplars of that category than others. Barsalou, like Lakoff and Rosch, 
insists that the graded structure of categories is a behavioral effect, i.e., how people in 
performance situations order exemplars in categories according to typicality or goodness- 
of-example. In this regard I direct your attention to (3) which is something of a synthesis 
of conclusions by Rosch, Lakoff, and Barsalou: 

"Behavioral effects are not to be interpreted as a direct reflection of cognitive structure 
(in this case, category membership). Categories are NOT represented in the mind in terms 
of prototypes or best examples. Prototype effects are evidence that subjects can judge
degree of prototypicality, not evidence of mental representation of categories" 

Thus, for example, the finding of Armstrong, Gleitman & Gleitman (1983) that subjects 
consider 703 not as good an exemplar of the category odd number as 9 is not to be taken to 
mean that the nominal or binary category "odd number" is represented cognitively as
scalar or graded.

In the Barsalou scheme, a concept is invoked or constructed to index (or, in
Barsalou's terminology, represent) a category. Thus, having wings is a concept that
represents the category of BIRD. Barsalou rejects the classical association of concepts 
with defining properties, distinctive features, or criteria for membership. Instead, the 
term concept refers to particular information used to represent a category on a particular 
occasion. That is, concepts are not necessarily invariant or stable. More precisely, the 
concept contains information that provides relevant expectations about the category in a 
given context as well as information about that category in most contexts [examples to
follow]. Among Barsalou's reasons for characterizing concepts in this way is the fact that
defining properties for categories are often not available. Take for example the category 
of MOTHER. In these days of adoption, surrogate motherhood, test-tube conception, and so 
forth, facile concepts of motherhood are inadequate. The concept "birth-giver" doesn't 
always work, because there are adoptive and foster mothers. Even "female" fails as an 
invariant concept, since there are females who have given birth and have since had a
sex-change operation. (More on this, if you're interested, in Lakoff, with a different
theoretical spin.) And even when criterial definitions for categories do exist, Barsalou
argues on the basis of empirical evidence that such definitions do not always operate in 
all people's representations of categories. To illustrate his point, Barsalou cites the 
category ANIMAL, as seen through the eyes of housewives and rednecks (by the way, the
possibly offensive stereotypes are Barsalou's). Barsalou argues that the housewife point of
view may generate a concept for animals that includes information about animals being 
small and domesticated. The redneck on the other hand might use a concept that includes 
information about animals being large and/or wild. Thus, a Pekingese may be judged more 
animal-like by a housewife, but judged a poorer exemplar of the category ANIMAL by the 
redneck. The variability of concepts is also documented in a variety of ad hoc and 
context-dependent behaviors. For example, Barsalou notes that the concept "floating" is 
not normally associated with the category BASKETBALL. However, when subjects are told 
that someone in a boating accident used a basketball as a life preserver, the concept
"floating" becomes activated.

These metacognitive behaviors have abundant parallels in judgments of
sentence grammaticality. We've probably all been in heated discussions where
assessments of grammaticality by theoretical fiat are challenged by skeptics who are able
to concoct a shaggy-dog story, the conclusion of which is a nominally starred sentence
that, embedded within rich layers of context, is now unobjectionable. As McCawley (1985,
with a "w") points out, linguists who offer their intuitions often are not grappling so
much with questions of grammaticality as reporting their success in imagining a
context where the sentence in question would sound OK. For example, sentences like (4) I
shaved me are ungrammatical, as binding principles make the -self affix obligatory.
However, Bolinger (1968) notes a context where -self is not obligatory, and indeed is 
prohibited. In a 1959 movie called "Rally 'round the flag, boys", the main character, who 
has just spilled perfume on himself, is told by another character "I don't see how you can 
resist you." [(5) on the handout] Bolinger explains that the reflexive construction not
only means 'X operates on X' but also implies that X must be interpreted as an
indissoluble entity. If for some reason a speaker wants to suggest a dissociation of an 
entity from itself, this can be done merely by avoiding the reflexive marker. Clearly, for 
certain discourse purposes, the putatively obligatory or nominally "grammatical" 
construction is not. 

Grammaticality judgments may be unstable because of this type of consideration 
and numerous others. As a common example, an individual may reject sentences like (6),
with stranded prepositions and improper case marking, on one occasion and accept them
on the next, depending on how prescriptivist the concepts are that he or she currently
uses to represent the category of well-formedness.




I would now like to flesh out these skeletal and anecdotal observations in a 
somewhat more formal fashion. In (7, 8, & 9) on the handout, I summarize and schematize 
with a few salient examples from both "real-world" and linguistic domains the elements of 
categorization and concept formation and application that I've been discussing. I'll walk 
us through these examples in turn. 

[these are my examples, not Barsalou's]

In (7), the category BIRD is instantiated by a subordinate category, "swallow". 
Ordinarily, we bring to a category what are typically considered criterial or defining 
features: in the case of this subcategory, e.g., "winged animal" and "belonging to the
Hirundinidae family". We may also summon a variable concept, here, "capable of flight".
This concept is of necessity variable, since it would apply to the (sub)category swallow
but not to the (sub)category penguin. With these and possibly other concepts we are able to
accurately categorize or judge a given bird to be a swallow or not. However, Barsalou 
argues that ostensibly invariant concepts are not invariant at all, but rather analytic 
fictions. What if, for example, a candidate creature were born wingless, or had had its
wings removed? Would it automatically be excluded from consideration as a possible
swallow? Would it be less of a swallow? Would it be graded, say, a 7 on a 10-point scale
of swallowness? In other words, is alateness (or "wingfulness") truly a defining feature?
Obviously, the same logic could be applied to the concept of forked tail. And the same is 
true of the notion of regular migration. Suppose that at some point in the future the 
famous Capistrano swallows didn't return to the monastery on the right day, or at all. 
Would they then cease to be swallows? Moreover, even the family membership concept may 
be variable. It so happens that certain members of the family Micropodidae (that is, the
swift family [no pun]), which resemble swallows in salient anatomical respects, are
commonly referred to as swallows by ornithologists and casual birdwatchers alike. The
fact that a taxonomic assignment places these birds in the swift family does not keep 
people from judging them to be swallows. 

In (8), the category RECTANGLE is instantiated by the subordinate category 
"square". The so-called invariant concept which supposedly represents this 
(sub)category is "plane figure with 4 sides of even length joined at right angles." But 
when is a square not a square? Let's suppose we supplied a context of convergent lines 
intersecting a square, as in (8a). Is this a square? Typically, it is not perceived and
judged to be a square. Suppose further that a different set of circumstances obtained,
whereby two congruent right isosceles triangles were placed together along the length of 
their hypotenuses. The resultant figure could be, by definition, a square, or two
triangles. For it to be judged a square, an individual would have to invoke, or indeed,
create a variable concept, namely, "two congruent right isosceles triangles joined along
the length of their hypotenuses". A considerable amount of variability across subjects
and instability within subjects would likely obtain if one manipulated contextual
features. For example, in reference to 8b on the handout, if the triangles were originally
separated and rotated, and then manually joined to yield the square, subjects might be
less inclined to judge the figure a square than under a condition where the triangles were
already joined [note the parallel to diachronic and synchronic notions of grammaticality in
language]. Similarly, referring to 8b and 8c, the perceived squareness of the figure
may vary, depending on whether the diagonal line representing the triangles' hypotenuses
is prominent, as opposed to a condition whereby the diagonal is obscured or not present
at all.

Such everyday perceptual effects are of course legion. In 8e, for example, the 
figure on the left is clearly a circle. Some would say, however, that the figure on the right
is not a circle, but a sphere. 

So much for the real world. Now for the linguistic domain. In (9) on your handout,
the superordinate category SENTENCE (or WELL-FORMEDNESS) is instantiated by a
subordinate category of sentences where NPs have been extracted. As you know, such
extractions are regulated by what Chomsky in 1964 called the A-over-A principle, and 
later reformulated under the rubric of the subjacency condition. Presumably, this 
concept and others, such as rules of phrase structure and core parameters, are brought to
the task of judging well-formedness (i.e., membership in the category of SENTENCE).
Thus, for example, we would know that a string such as (10) A large fell on my car is
ungrammatical, since a NOUN is obligatory in an NP. Similarly, we would know by this
and by invoking avatars of the A-over-A principle that strings like (11) What are you
cookin' on a hot? are ungrammatical. Indeed, normally this sentence would not even
generate discussion among native adult speakers of English; it's not borderline, it's just 
bad. And such a sentence would be bad in any language, since it violates putative 
universal constraints on the form of natural languages. However, those of you who have
read a recent article in Language by Bob Wilson and Ann Peters [the title of the article is
the sentence in (11)] know that just this sentence and others like it were produced
by Seth, the young son of the first author (and also, incidentally, by the 7-1/2-year-old
son of Barbara Partee).

I won't go into details of how Seth came to produce such sentences. It is a 
compelling story of how a child's congenital blindness, frequent linguistic games with the 
father, and the father's exceptional didactic routines seem to have conspired to engender a 
theoretical anomaly. Suffice it to say that within this youngster's developing grammar,
sentences like What are you cookin' on a hot? are permissible. Mind you, they're not
permitted in my grammar or in grammars of natural languages. BUT, in the course of
reading and rereading this fascinating article, my opinion of such sentences began to 
change. While I still recognized their ill formedness, they didn't seem as bad to me as 
they once had. Before reading about Seth, I would have placed such sentences among the 
most aberrant I'd ever heard. Now I'm more lenient. In other words, the "context", if you
will, of Seth's story introduced instability into my judgments of grammaticality. By the
way, I am not alone in this sentiment; several of my colleagues and students have
experienced similar reactions after reading the Wilson and Peters article, saying words to
the effect of "Hey, those sentences don't sound so bad any more." In terms of Barsalou's
framework, my presumably invariant concept of A-over-A or the subjacency condition is
not strictly applied. Instead, my judgment of grammaticality appeals to a variable
concept, one that derives from extraordinary contextual priming, namely the story of Seth.
As a result, the sentence is not nominally ungrammatical in my judgment, merely far down the
scale. 

[not talking about fuzzy grammar here, but behavioral effects, see (3) on handout] 



A couple of less exotic or idiosyncratic examples may more effectively suggest the 
applicability of Barsalou's scheme to grammaticality judgments. In (12) and (13) on the
handout are sentences that illustrate constraints on movement across PP Islands. With 
them are shown corresponding judgments of grammaticality by English native control 
subjects in a second-language acquisition study by Bley-Vroman, Felix, and Ioup. In this 
study, the stimulus sentences were presented in pairs. In the first pairing, given in (12),
both the A string and the B string were correctly judged by 79% of the subjects, while 18% 
of the subjects said that both A and B were good, and 3% got the contrast backwards. In 
the second pairing, given in (13), a very different pattern was observed. Only 53% got the
contrast correct, while 44% said both sentences were OK. Though subjects were not asked 
to explain their responses in a "Think-Aloud" verbal report protocol à la Ericsson &
Simon (1984), a reasonable post hoc hypothesis presents itself. Linguistically naive
subjects who glance at the two sentences in (12) might perceive a certain structural
similarity, namely, what are called dangling prepositions in grammar school. If they look 
at the two sentences in (13), however, there is more than a slight similarity. Indeed the 
sentences are superficially nearly identical: the constituents seem to match up almost 
one-for-one. A reasonable guess is that, once a subject has accepted the A sentence in 
(13), it is a trivial matter of analogical patterning to go ahead and accept (B), thus raising 
the proportion of "both OK" responses. That is, the dual considerations of A's
acceptability and its superficial resemblance to (B) yield an ad hoc concept of what is a
grammatical sentence and what is not. 

In the that-trace examples in (14) and (15), the instability of judgments is once
more documented, though the effect is somewhat weaker. This time the UNgrammatical 
exemplars are identical. Again, we cannot read the minds of the respondents, but I invite 
you to try. 

Barsalou's framework can also be applied to anomalous findings in Boutet's (1986) 
study of 6- to 11-year-old native French-speaking children. The instructions given to
these subjects were "Dis-moi pour toi ça fait une phrase" (Tell me [if] for you this makes
[i.e., is] a sentence). Curiously, nearly 30% of Boutet's subjects rejected the item in (16),
Quand ta grand-mère arrivera-t-elle? (When will your grandmother arrive?). Those
subjects' explanation? The item is a question, and therefore not a sentence. The
generation of such a concept to represent
well-formedness is not surprising, given that this was the only item in the corpus of 
stimuli that was an interrogative. Boutet's study is of further interest in that she 
documents a number of variable concepts such as "the words make a sentence of a good
length", "sounds OK if written in a telegram", "needs a comma", which are attested as 
varying both across and within subjects. These concepts are similar in their variability to 
those often invoked by adults when judging grammaticality, such as those suggested in
(9): namely, parsability, euphony, semantic appropriateness, one sentence "feels" better 
than another, and so forth. 
The arguments I have just presented are at best sketchy and speculative. They
remain to be expanded to a broad range of judgments of differing sentence types and to a 
variety of elicitation conditions, and have yet to be elaborated to account for other 
features of metalinguistic performance besides instability of grammaticality judgments. 
In these regards, I might mention that seed money has been granted for a large-scale study 
of the metalinguistic behaviors of native and near-native speakers, which takes up where 
Coppieters' (1987) controversial and methodologically-suspect study left off. Coppieters 
argues that near natives whose linguistic performance is indistinguishable from that of 
natives nevertheless demonstrate competence differences, as suggested by divergent 
grammaticality judgments. The follow-up study investigates both linguistic (competence) 
and non-linguistic factors that determine speakers' decision-making routines. It is 
hoped that eventually researchers will be able to shed light on obscure procedural aspects 
of individuals' grammaticality judgment routines, and that this increased understanding 
will help smooth the conceptual and empirical kinks out of the tortuous relationship
between linguistic knowledge and performance on grammaticality judgment tasks.
ON THE INSTABILITY OF GRAMMATICALITY JUDGMENTS

AAAL Annual Meeting, 1988 
David Birdsong 
University of Florida 



1. Who do you wanna do it? 



2. Lasnik (1981: 20): "Grammaticality judgments are often incorrectly considered as direct
reflections of [linguistic] competence. ... responding to a grammaticality query is an
instance of [metalinguistic] performance." (brackets mine; italics Lasnik's)

3. cf. Rosch, Lakoff, Barsalou: Behavioral effects are not to be interpreted as a direct 
reflection of cognitive structure (in this case, category membership). Categories are NOT
represented in the mind in terms of prototypes or best examples. Prototype effects are
evidence that subjects can judge degree of prototypicality, not evidence of mental 
representation of categories. 

4. *I shaved me.

5. I don't see how you can resist you. (Bolinger, 1968). 

6. Who are you talking with? 
7. CATEGORY: BIRD[iness]
   subordinate category: swallow
   "invariant" concepts: winged animal; forked tail; regular migration; Hirundinidae family
   variable concept: capable of flight

8. CATEGORY: RECTANGLE
   subordinate category: square
   "invariant" concept: plane figure with 4 sides of even length joined at right angles
   variable concept: 2 congruent right isosceles triangles joined along the length of their hypotenuses

8a-8e. [figures]

9. CATEGORY: SENTENCE[hood] (aka WELL-FORMEDNESS)
   subordinate category: NP-extracted strings
   "invariant" concepts: (constrained) A-over-A principle; S -> NP + VP; NP -> N (det, AP);
      core parameters (e.g., PRO-Drop)
   variable concepts: OK given contextual info; easy to parse; euphonic; semantically
      non-anomalous



10. *A large fell on my car. 

11. *What are you cookin' on a hot? (Wilson & Peters, 1988)

(Examples & data in 12-15 from Bley-Vroman, Felix & Ioup, 1988)

12. (A) Which bed does John like to sleep in?
    (B) *What did Albert put money in the box during?
    Judgments (%): CORRECT 79; BOTH BAD 0; BOTH OK 18; BACKWARDS 3

13. (A) Which bed does John like to sleep in?
    (B) *What time will Mary arrive before?
    Judgments (%): CORRECT 53; BOTH OK 44

14. (A) What did Frank say that Judy would like to read?
    (B) *What did John say that would fall on the floor, if we're not careful?
    Judgments (%): 44, 56

15. (A) Who did Ellen say Mary thought would pass the test?
    (B) *What did John say that would fall on the floor, if we're not careful?
    Judgments (%): 35, 9, 50

16. Quand ta grand-mère arrivera-t-elle? (Boutet, 1986)